Density Estimation and Visualization for Data Containing Clusters of Unknown Structure
نویسنده
چکیده
A method for measuring the density of data sets that contain an unknown number of clusters of unknown sizes is proposed. This method, called Pareto Density Estimation (PDE), uses hyper spheres to estimate data density. The radius of the hyper spheres is derived from information optimal sets. PDE leads to a tool for the visualization of probability density distributions of variables (PDEplot). For Gaussian mixture data this is an optimal empirical density estimation. A new kind of visualization of the density structure of high dimensional data set, the P-Matrix is defined. The P-Matrix for a 79dimensional data set from DNA array analysis is shown. The P-Matrix reveals local concentrations of data points representing similar gene expressions. The P-Matrix is also a very effective tool in the detection of clusters and outliers in data sets.
منابع مشابه
Optimal density estimation in data containing clusters of unknown structure
A method for measuring the density of data sets that contain an unknown number of clusters of unknown sizes is proposed. This method, called Pareto Density Estimation (PDE), uses hyper spheres to estimate data density. The radius of the hyper spheres is derived from information optimal sets. PDE leads to a tool for the visualization of probability density distributions of variables (PDEplot). F...
متن کاملMoment Inequalities for Supremum of Empirical Processes of U-Statistic Structure and Application to Density Estimation
We derive moment inequalities for the supremum of empirical processes of U-Statistic structure and give application to kernel type density estimation and estimation of the distribution function for functions of observations.
متن کاملSyllable structure in Old, Middle and Modern Persian: A contrastive analysis
Evolution of languages has always been of interest to linguists. In this paper we study the natural progress of the syllable structure from Old Persian (O.P) to Middle Persian (Mi.P) and up to the Modern Persian (Mo.P). For this purpose all the words containing consonant sequences are collected from specific sources of each of these languages, and then analysed according to the syllab...
متن کاملImprovement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
متن کاملPareto Density Estimation: A Density Estimation for Knowledge Discovery
Pareto Density Estimation (PDE) as defined in this work is a method for the estimation of probability density functions using hyperspheres. The radius of the hyperspheres is derived from optimizing information while minimizing set size. It is shown, that PDE is a very good estimate for data containing clusters of Gaussian structure. The behavior of the method is demonstrated with respect to clu...
متن کامل